Simulation of the Hands-free Speech Input to Speech Recognition Systems by Measuring Room Impulse Responses

Authors

  • Hans-Günter Hirsch
  • Andreas Kitzig
  • Klaus Linhard
Abstract

A hardware and software approach is presented in this paper to measure the room impulse response that defines the transmission of audio signals in a room. This approach was developed within the European SpeeCon project [1]. Graphical user interfaces have been designed to estimate the room impulse response from recordings of noise signals transmitted in the room, and to analyse the impulse response with respect to the reverberation time and the corresponding frequency response. The impulse response can be used to artificially create speech data that contain the influence of a hands-free speech input in a room. We used these speech data to investigate the performance degradation of a speech recognition system under this acoustic input condition. A few exemplary results are presented.

1 Measurement of room impulse response

The goal of the European SpeeCon project [1] was the collection of speech data for different languages, with a focus on recording speech utterances in hands-free mode inside rooms. This should support the development of recognition systems that allow, e.g., the control of electronic devices by speech input in hands-free mode. To measure the acoustic condition in each individual recording session, the hardware set-up shown in figure 1 was developed. The intention is to estimate the room impulse response, which describes the transmission of an audio signal in a room. With the impulse response it is possible to analyse the acoustic condition of each recording session individually. Furthermore, speech data can be artificially created that contain the effect of a hands-free speech input in this specific situation. A pink noise and a maximum length sequence (MLS) are played back from a CD player via a loudspeaker. Instead of the usually applied white noise, a pink noise with an energy distribution that decreases towards higher frequencies is used to compensate for the frequency characteristics of the small loudspeaker.
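The spectral shaping described above can be sketched in a few lines of numpy: white noise is filtered so that its power density falls off as 1/f (i.e. its magnitude spectrum as 1/sqrt(f)). This is a minimal illustration, not the paper's actual excitation signal; the function name, seed, and normalisation are assumptions.

```python
import numpy as np

def pink_noise(n, fs=16000, seed=0):
    """Shape white noise with a 1/sqrt(f) magnitude so power falls as 1/f."""
    rng = np.random.default_rng(seed)
    spec = np.fft.rfft(rng.standard_normal(n))
    f = np.fft.rfftfreq(n, d=1.0 / fs)
    f[0] = f[1]                       # avoid dividing by zero at DC
    spec /= np.sqrt(f)                # -3 dB per octave roll-off
    x = np.fft.irfft(spec, n)
    return x / np.max(np.abs(x))      # normalise to unit peak

noise = pink_noise(16000)             # one second at the 16 kHz rate used here
```

The decreasing energy towards higher frequencies partly counteracts the weak low-frequency response of a small loudspeaker, which is the motivation given in the text.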
The noise signals are recorded with two microphones and stored on a PC as digital signals at a sampling rate of 16 kHz. One microphone (M1) is close to the loudspeaker; the second microphone (M2) is placed at the desired position in the room where the impulse response is to be measured. An impulse response could be estimated by comparing the signal recorded at microphone M2 with the noise signal as stored on the CD, but in that case the estimated impulse response would also include the transmission characteristics of the loudspeaker. This is avoided by the second recording close to the loudspeaker: the two microphone signals can then be taken to estimate the transmission characteristics between the microphones.

We apply two approaches to determine the impulse response, either from the recordings of the pink noise or from the recorded MLS sequences. In the case of the pink noise, we estimate the power density spectrum of each of the two microphone signals, e.g. with the Welch method, where the noise signal is split into segments, the spectrum of each segment is calculated with a DFT, and the power density spectrum is determined as the average over all segments. The ratio of the power density spectrum of microphone M2 to the corresponding spectrum of M1 yields an estimate of the room transfer function.

An MLS sequence has the interesting property that its autocorrelation function is approximately a Dirac impulse. This is especially true for long MLS sequences; we use an MLS sequence of length 16383. The signal recorded by microphone M2 can be described as the convolution of the signal recorded by microphone M1 with the room impulse response.

Figure 1: Hardware set-up to measure the impulse response of a room
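The Dirac-like autocorrelation of an MLS can be demonstrated with a short linear-feedback shift register. The sketch below uses order 4 (period 15) for readability instead of the paper's order 14 (period 16383 = 2^14 - 1); the circular autocorrelation of a ±1-valued MLS is N at lag 0 and exactly -1 at every other lag. The function name and tap convention are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def mls(order, taps):
    """Maximum length sequence from a Fibonacci LFSR, mapped to +/-1."""
    n = (1 << order) - 1
    reg = [1] * order                     # non-zero seed
    seq = np.empty(n)
    for i in range(n):
        seq[i] = 1.0 if reg[-1] else -1.0
        fb = 0
        for t in taps:                    # XOR of the tapped stages
            fb ^= reg[t - 1]
        reg = [fb] + reg[:-1]             # shift feedback bit in
    return seq

# x^4 + x^3 + 1 is primitive, so taps (4, 3) give the full period 15.
s = mls(4, (4, 3))

# Circular autocorrelation via the DFT: N at lag 0, -1 everywhere else.
acf = np.fft.ifft(np.abs(np.fft.fft(s)) ** 2).real
```

Because microphone M2 records the convolution of the M1 signal with the room impulse response, cross-correlating the two recordings against this near-Dirac autocorrelation directly exposes an estimate of the impulse response.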


Similar references

Acoustical Sound Database in Real Environments for Sound Scene Understanding and Hands-Free Speech Recognition

This paper reports on a project for collection of the sound scene data. The sound scene data is necessary for studies such as sound source localization, sound retrieval, sound recognition and hands-free speech recognition in real acoustical environments. There are many kinds of sound scenes in real environments. The sound scene is denoted by sound sources and room acoustics. The number of combi...


Data collection in real acoustical environments for sound scene understanding and hands-free speech recognition

This paper describes a sound scene database necessary for studies such as sound source localization, sound retrieval, sound recognition and hands-free speech recognition in real acoustical environments. This paper reports on a project for collection of the sound scene data supported by Real World Computing Partnership(RWCP). There are many kinds of sound scenes in real environments. The sound s...


Real Environment Acoustic Database

Recently, the importance of hands-free speech communication has been increasingly recognized. Sound data for open evaluation is necessary for studies such as sound source localization, sound retrieval, sound recognition and hands-free speech recognition in real acoustic environments. This paper reports on our project for the acoustic data collection. There are many kinds of sounds in real environm...


CENSREC-4: development of evaluation framework for distant-talking speech recognition under reverberant environments

In this paper, we introduce a collection of databases and evaluation tools called CENSREC-4, which is an evaluation framework for distant-talking speech under hands-free conditions. Distant-talking speech recognition is crucial for a hands-free speech interface. Therefore, we measured room impulse responses to investigate reverberant speech recognition in various environments. The data con...


Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...




Journal title:

Volume   Issue 

Pages  -

Publication date: 2010